[ML] Inference API _services retrieves authorization information directly from EIS #134398
Conversation
    components.add(httpClientManager);
    components.add(inferenceStatsBinding);
    components.add(authorizationHandler);
    components.add(new PluginComponentBinding<>(Sender.class, elasicInferenceServiceFactory.get().createSender()));
Without this I get a binding error when running
    );

    return new InferenceServiceConfiguration.Builder().setService(NAME)
        .setName(NAME)
These aren't really necessary, but I was running into a test failure because this field didn't exist until I switched the sorting field. Figured I'd leave the fix in though.
    @Override
    public Set<TaskType> supportedStreamingTasks() {
    -   return authorizationHandler.supportedStreamingTasks();
    +   return EnumSet.of(TaskType.CHAT_COMPLETION);
Previously if the cluster wasn't authorized for chat completion we'd return a vague error about not being able to stream. With this change we'll allow the request to get sent to EIS and if it isn't authorized, EIS will return a failure.
Pinging @elastic/ml-core (Team:ML)
    // This shouldn't be called because the configuration changes based on the authorization
    // Instead, retrieve the authorization directly from the EIS gateway and use the static method
    // ElasticInferenceService.Configuration#createConfiguration() to create a configuration based on the authorization response
Could this be a javadoc comment instead, so that the method you're referencing stays correct even if it's modified?
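For illustration, a minimal sketch of what that could look like, assuming the method reference from the comment stays as written; the point of the {@link} tag is that IDE refactors and javadoc checks keep the reference in sync:

    /**
     * This shouldn't be called because the configuration changes based on the authorization.
     * Instead, retrieve the authorization directly from the EIS gateway and use
     * {@link ElasticInferenceService.Configuration#createConfiguration} to create a
     * configuration based on the authorization response.
     */
    // (attached to whichever method the original line comment sits above)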
LGTM
Some minor comments
    super.setUp();
    // Ensure the mock EIS server has an authorized response ready before each test because each test will
    // use the services API which makes a call to EIS
    mockEISServer.enqueueAuthorizeAllModelsResponse();
The init() method below also calls mockEISServer.enqueueAuthorizeAllModelsResponse(). Is it equivalent to change the annotation on that method to @Before?
Yeah, the original reason for using @BeforeClass is that I ran into some weird issues locally. The original PR that added the @BeforeClass is here: #128640
For some background, when the node for the test starts up it will reach out to EIS and get the auth response. If that fails (there isn't a response queued in the mock server), then the tests will fail. What I was observing is that the base class's static logic would only be executed once regardless of how many subclasses used the base. This resulted in the first test class succeeding, but the second test class that leveraged the base would fail. To get around this I added the @BeforeClass and it seemed to fix the issue. The reason we need this in @BeforeClass is that we need a response queued before Elasticsearch is started, and Elasticsearch is started only once, at the beginning, before all the tests run.
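To make the ordering concrete, here's a minimal, self-contained JUnit 4 sketch of why both hooks end up being used. The class, method, and stub names are hypothetical; only enqueueAuthorizeAllModelsResponse() is taken from the thread, and the comments restate the ordering described above rather than asserting framework internals:

    import org.junit.Before;
    import org.junit.BeforeClass;

    public abstract class BaseEisAuthServerTestSketch {
        // Hypothetical stand-in for the mock EIS server used by the real tests.
        protected static final MockEisServer mockEISServer = new MockEisServer();

        // Runs once per test class: queues a response early enough that, per the discussion
        // above, one is available when the test node first reaches out to EIS on startup.
        @BeforeClass
        public static void queueStartupAuthResponse() {
            mockEISServer.enqueueAuthorizeAllModelsResponse();
        }

        // Runs before every test method: each test hits the services API, which makes
        // another call to EIS and therefore consumes another queued response.
        @Before
        public void queuePerTestAuthResponse() {
            mockEISServer.enqueueAuthorizeAllModelsResponse();
        }

        // Trivial stub so the sketch compiles on its own.
        protected static class MockEisServer {
            void enqueueAuthorizeAllModelsResponse() { /* queue a canned "authorize all models" response */ }
        }
    }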
        }
    );

    getServiceConfigurationsForServices(availableServices, mergeEisConfigListener);
getServiceConfigurationsForServices() is a synchronous method and could be written with a return type instead of a listener. I think that would make this code easier to read, as you wouldn't need to define the merge listener.
    }).<List<InferenceServiceConfiguration>>andThen((configurationListener, authorizationModel) -> {
        var serviceConfigs = getServiceConfigurationsForServices(availableServices);
        if (authorizationModel.isAuthorized() == false) {
            delegate.onResponse(serviceConfigs);
            return;
        }
        if (requestedTaskType != null && authorizationModel.getAuthorizedTaskTypes().contains(requestedTaskType) == false) {
            delegate.onResponse(serviceConfigs);
            return;
        }
        var config = ElasticInferenceService.createConfiguration(authorizationModel.getAuthorizedTaskTypes());
        serviceConfigs.add(config);
        serviceConfigs.sort(Comparator.comparing(InferenceServiceConfiguration::getService));
        delegate.onResponse(serviceConfigs);
    }
🤦♂️ Thank you, for some reason I thought it needed to use a listener.
    .stream()
    .filter(
        service -> service.getValue().hideFromConfigurationApi() == false
        // exclude EIS here because the hideFromConfigurationApi() is not supported
I was slightly confused about "hideFromConfigurationApi() is not supported" in this comment
    -   // exclude EIS here because the hideFromConfigurationApi() is not supported
    +   // Exclude EIS as the EIS specific configurations are handled separately
    configurationMap.put(
        MODEL_ID,
        new SettingsConfiguration.Builder(
            EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.CHAT_COMPLETION, TaskType.RERANK, TaskType.TEXT_EMBEDDING)
Should this be the intersection of enabledTaskTypes and the full set?

    EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.CHAT_COMPLETION, TaskType.RERANK, TaskType.TEXT_EMBEDDING).retainAll(enabledTaskTypes);
The list of task types here tells the UI for which task types this field should be configurable. That should stay the same regardless of whether the user is authorized for a specific task type. There's a top-level field for task types that indicates which ones are authorized, and that's set here:
    return new InferenceServiceConfiguration.Builder().setService(NAME)
        .setName(SERVICE_NAME)
        .setTaskTypes(enabledTaskTypes) <--------
        .setConfigurations(configurationMap)
        .build();
This PR modifies the _inference/_services API to make a call directly to EIS to get the authorization information used to determine the configuration information to return in the API call. The authorization response will dictate whether we show EIS in the response and which task types are included for the EIS configuration.
Follow-up PRs will modify how authorization works in general, but I'm going to try to do it in small chunks.